Constructing stylistic synthesis databases from audio books
نویسندگان
چکیده
In this paper, we explore how to construct stylistic TTS databases from audio books, in which a storyteller performs multiple roles. The goal is to identify and build a set of speech corpora, each of which not only portrays a representative voice style performed by the speaker, but also has sufficient sentences to synthesize natural speech using unit selection approach. We solve the problem in two procedures: first, by representing each role with Gaussian Mixture Models (GMM), all speech data are partitioned into a number of voice style clusters with a criterion that maximizes the likelihood of all utterances with respect to roles’ speaker models; then, pruning in terms of both acoustic and prosodic measures is followed to purify the clusters. The resulting 4 voice styles are subjectively interpreted as Neutral, Young, Elder and Adult, respectively. Perceptual experiments show that the proposed approach can synthesize speech with the recognizable voice styles with an average 72.5% identification rate, and the synthesized speech sounds better than those synthesized with utterances from a single role.
منابع مشابه
Automatic Building of Synthetic Voices from Audio Books
Current state-of-the-art text-to-speech systems produce intelligible speech but lack the prosody of natural utterances. Building better models of prosody involves development of prosodically rich speech databases. However, development of such speech databases requires a large amount of effort and time. An alternative is to exploit story style monologues (long speech files) in audio books. These...
متن کاملمعیارهای ارزیابی و تولید کتابهای گویا از دیدگاه تولیدکنندگان: تحلیل محتوای کیفی
Purpose: Audio books have a special stand in the publishing industry. Publishers around the world produce audio books with different criterions and standards. This study aimed to identify and introduce the most important criterions for evaluation and production of audio books from the producers' point of view. Methodology: this study was performed with qualitative content analysis of interview...
متن کاملOverview of NIT HMM - based speech synthesis system for Blizzard Challenge 2012
This paper describes a hidden Markov model (HMM) based speech synthesis system developed for the Blizzard Challenge 2012. In the Blizzard Challenge 2012, we focused on a design of contexts for using audio books as training data and duration modeling of silence between sentences for synthesizing paragraphs. It is well known that contextual factors affect speech. We use extended contexts for usin...
متن کامل"TalkPrinting": Improving Speaker Recognition by Modeling Stylistic Features
Automatic speaker recognition is an important technology for intelligence gathering, law enforcement, and audio mining. Conventional speaker recognition systems, which are based on independent short-term spectral samples, suffer from a lack of noise robustness and are unable to model a speaker’s idiosyncratic stylistic features. This paper describes “TalkPrinting”, a program of research aimed a...
متن کاملOrdinal measures in authorship identification∗
The goal of this paper is to compare a set of distance/similarity measures, regarding theirs ability to reflect stylistic similarity between authors and texts. To assess the ability of these distance/similarity functions to capture stylistic similarity between texts, we tested them in one of the most frequently employed multivariate statistical analysis settings: cluster analysis. The experimen...
متن کامل